# Multimodal Document Processing
Smoldocling 256M Preview Mlx Bf16 Docling Snap
A 256M-parameter preview release of a document understanding model designed for document structure parsing and content extraction. It converts document images into structured data.
Image-to-Text
Transformers English

ds4sd
246
1
Udop Large 512 300k
MIT
UDOP is a universal document processing model, based on the T5 architecture, that unifies the vision, text, and layout modalities. It is suited to document AI tasks.
Image-to-Text
Transformers

microsoft
264
32
Udop Large 512
MIT
UDOP is a universal document processing model, based on the T5 architecture, that unifies the vision, text, and layout modalities. It is suited to tasks such as document image classification, document parsing, and visual question answering.
Image-to-Text
Transformers

microsoft
193
5
Featured Recommended AI Models